{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Small, Local Model (IBM Granite) Multi-Agent RAG\n",
    "\n",
    "Can you create effective agents with small, 8B-parameter models? Yes, you can.\n",
    "\n",
    "Small LLMs, like the IBM Granite 8B, are advantageous because they can run locally on a powerful laptop, eliminating the need for API calls to external service providers. But can you create effective agents with them? This answer is \"yes\" - with the right techniques.\n",
    "\n",
    "This notebook demonstrates how to optimize the performance of small language models, such as the IBM Granite 8B, in a Retrieval-Augmented Generation (RAG) workflow. By combining document and web search tools with a high-level planner and reflection-driven agents, it achieves effective, dynamic problem-solving. The notebook also shows how this process can be executed locally using Ollama, with examples provided later.\n",
    "\n",
    "Workflow Overview:\n",
    "\n",
    "1. Plan Creation (Planner):\n",
    "  - The Planner generates an initial plan based on user instructions.\n",
    "\n",
    "2. Plan Execution (Iterative Process):\n",
    "  - The plan is carried out step-by-step through multiple iterations:\n",
    "    - Step Selection: If no prior outputs exist, the first step of the plan is executed. Otherwise, the system evaluates the results of previous steps using the Generic Assistant before advancing.\n",
    "    - Information Retrieval: Each step is executed by the Research Assistant, which performs web and document searches to gather relevant data.\n",
    "\n",
    "3. Reflection and Adjustment (Reflection Assistant):\n",
    " - After each step, the Reflection Assistant evaluates its success and determines how to adapt the plan, if needed. This ensures the workflow remains flexible and responsive to intermediate results.\n",
    "\n",
    "4. Final Output:\n",
    " - The accumulated contextual information is synthesized into a final response, directly addressing the user’s original query.\n",
    "\n",
    "By leveraging these tactics, this notebook showcases how smaller models, equipped with effective planning and reflection mechanisms, can deliver impactful results even on resource-constrained systems.\n"
   ]
  },
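  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The control flow above can be summarized in pseudocode (the names here are illustrative sketches, not the actual functions defined later in this notebook):\n",
    "\n",
    "```python\n",
    "plan = planner(user_query)                      # 1. Plan creation\n",
    "context = []\n",
    "for _ in range(max_plan_steps):                 # 2. Iterative execution\n",
    "    step = reflect(plan, context)               # 3. Reflection selects or repairs the next step\n",
    "    if step == \"TERMINATE\":\n",
    "        break\n",
    "    context.append(research(step))              # Web and document retrieval\n",
    "final_answer = synthesize(user_query, context)  # 4. Final output\n",
    "```\n"
   ]
  },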
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Requirements\n",
    "\n",
    "Ensure the autogen python package is installed, along with a few LangChain libraries for retrieval.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -U ag2[openai] ag2[ollama] langchain_community langchain-milvus langchain_huggingface"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Embeddings Model\n",
    "\n",
    "Here, we need to download an embeddings model for the vector database we will be setting up for document retrieval. In this example, we are using the IBM Granite embeddings model. It comes in a few different flavors, which can be found on [Hugging Face](https://huggingface.co/ibm-granite). You can also substitute any embeddings model of your choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_huggingface import HuggingFaceEmbeddings\n",
    "\n",
    "embeddings_model = HuggingFaceEmbeddings(model_name=\"ibm-granite/granite-embedding-30m-english\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Establish the vector database\n",
    "\n",
    "In this example, we are using Milvus as our vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "\n",
    "from langchain_milvus import Milvus\n",
    "\n",
    "db_file = tempfile.NamedTemporaryFile(prefix=\"milvus_\", suffix=\".db\", delete=False).name  # noqa: SIM115\n",
    "print(f\"The vector database will be saved to {db_file}\")\n",
    "\n",
    "vector_db = Milvus(\n",
    "    embedding_function=embeddings_model,\n",
    "    connection_args={\"uri\": db_file},\n",
    "    auto_id=True,\n",
    "    enable_dynamic_field=True,\n",
    "    index_params={\"index_type\": \"AUTOINDEX\"},\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identify documents to ingest\n",
    "\n",
    "In the following code, you provide a list of folders and document extensions that you would like to ingest into your vector database for future retrieval.\n",
    "Be sure to personalize the identified directories and file extensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "\n",
    "# Specify the file extensions you would like to include (e.g., '.txt', '.md')\n",
    "allowed_extensions = {\".txt\", \".md\"}\n",
    "\n",
    "# Update this to include the directories you want to scan\n",
    "source_directories = [\n",
    "    Path(\"~/Downloads/\").expanduser(),\n",
    "    Path(\"~/Documents/\").expanduser(),  # Add more directories as needed\n",
    "]\n",
    "\n",
    "# Collect files from all specified directories\n",
    "sources = []\n",
    "for source_directory in source_directories:\n",
    "    sources.extend(\n",
    "        file\n",
    "        for file in source_directory.glob(\"**/*\")  # Includes all files and subdirectories recursively\n",
    "        if file.is_file() and file.suffix in allowed_extensions  # Include only files with specific extensions\n",
    "    )\n",
    "\n",
    "# Load and process the files\n",
    "documents = []\n",
    "for file in sources:\n",
    "    loader = TextLoader(file)  # Initialize TextLoader for each file\n",
    "    documents.extend(loader.load())  # Load and extend the documents list\n",
    "\n",
    "# Split the documents into chunks\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# Now `texts` contains the processed and split documents"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ingest the documents\n",
    "\n",
    "Finally, after chunking the documents, load them into the vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_db.add_documents(texts)"
   ]
  },
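  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, you can query the vector database directly before wiring it into the agents. The query string below is only a placeholder; substitute a phrase that actually appears in your ingested documents.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sanity check: retrieve the top matches for a sample query.\n",
    "# Replace the placeholder query with text relevant to your own documents.\n",
    "sample_hits = vector_db.similarity_search(\"project status update\", k=3)\n",
    "for doc in sample_hits:\n",
    "    print(doc.metadata.get(\"source\", \"unknown source\"))\n",
    "    print(doc.page_content[:200])"
   ]
  },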
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Agent Prompts\n",
    "\n",
    "The following are the prompts for the agents that we will define later. Try running with these prompts initially, and update them later if needed for your use case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PLANNER_MESSAGE = \"\"\"You are a task planner. You will be given some information your job is to think step by step and enumerate the steps to complete a performance assessment of a given user, using the provided context to guide you.\n",
    "    You will not execute the steps yourself, but provide the steps to a helper who will execute them. Make sure each step consists of a single operation, not a series of operations. The helper has the following capabilities:\n",
    "    1. Search through a collection of documents provided by the user. These are the user's own documents and will likely not have latest news or other information you can find on the internet.\n",
    "    2. Synthesize, summarize and classify the information received.\n",
    "    3. Search the internet\n",
    "    Please output the step using a properly formatted python dictionary and list. It must be formatted exactly as below:\n",
    "    ```{\"plan\": [\"Step 1\", \"Step 2\"]}```\n",
    "\n",
    "    Respond only with the plan json with no additional fields and no additional text. Here are a few examples:\n",
    "    Example 1:\n",
    "    User query: Write a performance self-assessment for Joe, consisting of a high-level overview of achievements for the year, a listing of the business impacts for each of these achievements, a list of skills developed and ways he's collaborated with the team.\n",
    "    Your response:\n",
    "    ```{\"plan\": [\"Query documents for all contributions involving Joe this year\", \"Quantify the business impact for Joe's contributions\", \"Enumerate the skills Joe has developed this year\", \"List several examples of how Joe's work has been accomplished via team collaboration\", \"Formulate the performance review based on collected information\"]}```\n",
    "\n",
    "    Example 2:\n",
    "    User query: Find the latest news about the technologies I'm working on.\n",
    "    Your response:\n",
    "    ```{\"plan\": [\"Query documents for technologies used\", \"Search the internet for the latest news about each technology\"]}```\n",
    "    \"\"\"\n",
    "\n",
    "ASSISTANT_PROMPT = \"\"\"You are an AI assistant.\n",
    "    When you receive a message, figure out a solution and provide a final answer. The message will be accompanied with contextual information. Use the contextual information to help you provide a solution.\n",
    "    Make sure to provide a thorough answer that directly addresses the message you received.\n",
    "    The context may contain extraneous information that does not apply to your instruction. If so, just extract whatever is useful or relevant and use it to complete your instruction.\n",
    "    When the context does not include enough information to complete the task, use your available tools to retrieve the specific information you need.\n",
    "    When you are using knowledge and web search tools to complete the instruction, answer the instruction only using the results from the search; do no supplement with your own knowledge.\n",
    "    Be persistent in finding the information you need before giving up.\n",
    "    If the task is able to be accomplished without using tools, then do not make any tool calls.\n",
    "    When you have accomplished the instruction posed to you, you will reply with the text: ##SUMMARY## - followed with an answer.\n",
    "    If you are using knowledge and web search tools, make sure to provide the URL for the page you are using as your source or the document name.\n",
    "    Important: If you are unable to accomplish the task, whether it's because you could not retrieve sufficient data, or any other reason, reply only with ##TERMINATE##.\n",
    "\n",
    "    # Tool Use\n",
    "    You have access to the following tools. Only use these available tools and do not attempt to use anything not listed - this will cause an error.\n",
    "    Respond in the format: <function_call> {\"name\": function name, \"arguments\": dictionary of argument name and its value}. Do not use variables.\n",
    "    Only call one tool at a time.\n",
    "    When suggesting tool calls, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.\n",
    "    \"\"\"\n",
    "\n",
    "REFLECTION_ASSISTANT_PROMPT = \"\"\"You are an assistant. Please tell me what is the next step that needs to be taken in a plan in order to accomplish a given task.\n",
    "    You will receive json in the following format, and will respond with a single line of instruction.\n",
    "\n",
    "    {\n",
    "        \"Goal\": The original query from the user. Every time you create a reply, it must be guided by the task of fulfilling this goal. Do not veer off course.,\n",
    "        \"Plan\": An array that enumerates every step of the plan,\n",
    "        \"Previous Step\": The step taken immediately prior to this message.\n",
    "        \"Previous Output\": The output generated by the last step taken.\n",
    "        \"Steps Taken\": A sequential array of steps that have already been executed prior to the last step,\n",
    "\n",
    "    }\n",
    "\n",
    "    Instructions:\n",
    "        1. If the very last step of the plan has already been executed, or the goal has already been achieved regardless of what step is next, then reply with the exact text: ##TERMINATE##\n",
    "        2. Look at the \"Previous Step\". If the previous step was not successful and it is integral to achieving the goal, think of how it can be retried with better instructions. Inspect why the previous step was not successful, and modify the instruction to find another way to achieve the step's objective in a way that won't repeat the same error.\n",
    "        3. If the last previous was successful, determine what the next step will be. Always prefer to execute the next sequential step in the plan unless the previous step was unsuccessful and you need to re-run the previous step using a modified instruction.\n",
    "        4. When determining the next step, you may use the \"Previous Step\", \"Previous Output\", and \"Steps Taken\" to give you contextual information to decide what next step to take.\n",
    "\n",
    "    Be persistent and resourceful to make sure you reach the goal.\n",
    "    \"\"\"\n",
    "\n",
    "CRITIC_PROMPT = \"\"\"The previous instruction was {last_step} \\nThe following is the output of that instruction.\n",
    "    if the output of the instruction completely satisfies the instruction, then reply with ##YES##.\n",
    "    For example, if the instruction is to list companies that use AI, then the output contains a list of companies that use AI.\n",
    "    If the output contains the phrase 'I'm sorry but...' then it is likely not fulfilling the instruction. \\n\n",
    "    If the output of the instruction does not properly satisfy the instruction, then reply with ##NO## and the reason why.\n",
    "    For example, if the instruction was to list companies that use AI but the output does not contain a list of companies, or states that a list of companies is not available, then the output did not properly satisfy the instruction.\n",
    "    If it does not satisfy the instruction, please think about what went wrong with the previous instruction and give me an explanation along with the text ##NO##. \\n\n",
    "    Previous step output: \\n {last_output}\"\"\"\n",
    "\n",
    "SEARCH_TERM_ASSISTANT_PROMPT = \"\"\"You are an expert at creating precise, complete, and accurate web search queries. When given a description of what a user is looking for, you will generate a fully formed, optimized search query that can be used directly in a search engine to find the most relevant information.\n",
    "\n",
    "    Key Requirements:\n",
    "\n",
    "        Stick to the Description: Use only the information explicitly provided in the description. Do not add, assume, or invent details that are not stated.\n",
    "        Be Complete and Concise: The search query must contain all necessary keywords and phrases required to fulfill the description without being unnecessarily verbose.\n",
    "        Avoid Vague or Placeholder Terms: Do not include incomplete terms (e.g., no placeholder variables or references to unspecified concepts).\n",
    "        Use Proper Context and Refinement: Include context, if applicable (e.g., location, date, format). Utilize search modifiers like quotes, \"site:\", \"filetype:\", or Boolean operators (AND, OR, NOT) to refine the query when appropriate.\n",
    "        Avoid Hallucination: Do not make up or fabricate any details that are not explicitly stated in the description.\n",
    "\n",
    "    Example Input:\n",
    "    \"Find the latest research papers about AI-driven medical imaging published in 2023.\"\n",
    "\n",
    "    Example Output:\n",
    "    \"latest research papers on AI-driven medical imaging 2023\"\n",
    "\n",
    "    Another Example Input:\n",
    "    \"Find a website that lists the top restaurants in Paris with outdoor seating.\"\n",
    "\n",
    "    Example Output:\n",
    "    \"top restaurants in Paris with outdoor seating\"\n",
    "\n",
    "    Incorrect Example Input:\n",
    "    \"Find the population of Atlantis.\"\n",
    "\n",
    "    Incorrect Example Output:\n",
    "    \"population of Atlantis 2023\" (This is incorrect because the existence or details about Atlantis are not explicitly stated in the input and must not be assumed.)\n",
    "\n",
    "    Your Turn:\n",
    "    Generate a complete, accurate, and optimized search query based on the description provided below:\n",
    "    \"\"\""
   ]
  },
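  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `CRITIC_PROMPT` is a template rather than a fixed system message: the workflow fills in `last_step` and `last_output` each time the critic is invoked. A minimal illustration with made-up values:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The step and output below are hypothetical; the real workflow supplies them at runtime.\n",
    "example_message = CRITIC_PROMPT.format(\n",
    "    last_step=\"List companies that use AI\",\n",
    "    last_output=\"Company A and Company B both use AI in production.\",\n",
    ")\n",
    "print(example_message)"
   ]
  },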
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ollama Setup\n",
    "\n",
    "If you are running inferencing locally and do not have Ollama setup already, use the following commands to do so."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install ollama\n",
    "! ollama serve"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pull the Ollama model\n",
    "\n",
    "If you are using Ollama, you will need to pull the relevant LLM. In this example we are using the Granite 3.1 dense model at 8b parameters. It is capable enough to perform well in this flow but small enough to run on a beefy personal machine.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! ollama pull granite3.1-dense:8b"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration Variables\n",
    "\n",
    "Customize the following LLM configuration variables as needed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ollama URL\n",
    "base_url = \"http://localhost:11434\"\n",
    "\n",
    "# API key for LLM Inferencing. This default value  can be used when running Ollama locally.\n",
    "api_key = \"ollama\"\n",
    "\n",
    "# LLM to use for all agents. The value below corresponds to its name in Ollama.\n",
    "default_model = \"granite3.1-dense:8b\"\n",
    "\n",
    "# Model temperature. A lower temperature gives more predictable results.\n",
    "model_temp = 0\n",
    "\n",
    "# Maximum number of steps that are allowed to be executed in a plan (prevents a never-ending loop)\n",
    "max_plan_steps = 6"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AG2 Agent Setup\n",
    "\n",
    "Initializes the LLM config and all of the agents of the workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen import ConversableAgent, LLMConfig, coding\n",
    "\n",
    "##################\n",
    "# AG2 Config\n",
    "##################\n",
    "# LLM Config\n",
    "llm_config = LLMConfig(\n",
    "    {\n",
    "        \"model\": default_model,\n",
    "        \"client_host\": base_url,\n",
    "        \"api_type\": \"ollama\",\n",
    "        \"cache_seed\": None,\n",
    "        \"price\": [0.0, 0.0],\n",
    "    },\n",
    "    temperature=model_temp,\n",
    ")\n",
    "\n",
    "# Generic Assistant - Used for general inquiry. Does not call tools.\n",
    "generic_assistant = ConversableAgent(name=\"Generic_Assistant\", llm_config=llm_config, human_input_mode=\"NEVER\")\n",
    "\n",
    "# Search Term Assistant - Used for finding relevant search terms for a user's query\n",
    "web_search_assistant = ConversableAgent(\n",
    "    name=\"Web_Search_Term_Assistant\",\n",
    "    system_message=SEARCH_TERM_ASSISTANT_PROMPT,\n",
    "    llm_config=llm_config,\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# Provides the initial high level plan\n",
    "planner = ConversableAgent(\n",
    "    name=\"Planner\", system_message=PLANNER_MESSAGE, llm_config=llm_config, human_input_mode=\"NEVER\"\n",
    ")\n",
    "\n",
    "# The assistant agent is responsible for executing each step of the plan, including calling tools\n",
    "assistant = ConversableAgent(\n",
    "    name=\"Research_Assistant\",\n",
    "    system_message=ASSISTANT_PROMPT,\n",
    "    llm_config=llm_config,\n",
    "    human_input_mode=\"NEVER\",\n",
    "    is_termination_msg=lambda msg: \"tool_response\" not in msg and msg[\"content\"] == \"\",\n",
    ")\n",
    "\n",
    "# Critques the output of other agents. Prompt will be fed in for each call.\n",
    "critic = ConversableAgent(name=\"Critic\", llm_config=llm_config, human_input_mode=\"NEVER\")\n",
    "\n",
    "# Summary agent clarifies another agent's reply in context to the original instruction. Prompt will be fed in for each call.\n",
    "summary_agent = ConversableAgent(name=\"SummaryAssistant\", llm_config=llm_config, human_input_mode=\"NEVER\")\n",
    "\n",
    "# Reflection Assistant: Reflect on plan progress and give the next step\n",
    "reflection_assistant = ConversableAgent(\n",
    "    name=\"ReflectionAssistant\",\n",
    "    system_message=REFLECTION_ASSISTANT_PROMPT,\n",
    "    llm_config=llm_config,\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# User Proxy chats with assistant on behalf of user and executes tools\n",
    "code_exec = coding.LocalCommandLineCodeExecutor(\n",
    "    timeout=10,\n",
    "    work_dir=\"code_exec\",\n",
    ")\n",
    "user_proxy = ConversableAgent(\n",
    "    name=\"User\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    code_execution_config={\"executor\": code_exec},\n",
    "    is_termination_msg=lambda msg: \"##SUMMARY##\" in msg[\"content\"]\n",
    "    or \"## Summary\" in msg[\"content\"]\n",
    "    or \"##TERMINATE##\" in msg[\"content\"]\n",
    "    or (\"tool_calls\" not in msg and msg[\"content\"] == \"\"),\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Document Search Tool\n",
    "\n",
    "The following function is registered as a tool for searching the previously created vector database for relevant documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Annotated\n",
    "\n",
    "\n",
    "@assistant.register_for_llm(\n",
    "    name=\"personal_knowledge_search\", description=\"Searches personal documents according to a given query\"\n",
    ")\n",
    "@user_proxy.register_for_execution(name=\"personal_knowledge_search\")\n",
    "def do_knowledge_search(search_instruction: Annotated[str, \"search instruction\"]) -> str:\n",
    "    \"\"\"Given an instruction on what knowledge you need to find, search the user's documents for information particular to them, their projects, and their domain.\n",
    "    This is simple document search, it cannot perform any other complex tasks.\n",
    "    This will not give you any results from the internet. Do not assume it can retrieve the latest news pertaining to any subject.\"\"\"\n",
    "    if not search_instruction:\n",
    "        return \"Please provide a search query.\"\n",
    "\n",
    "    messages = \"\"\n",
    "    # docs = vector_db.similarity_search(search_instruction)\n",
    "    retriever = vector_db.as_retriever(search_kwargs={\"fetch_k\": 10, \"max_tokens\": 500})\n",
    "\n",
    "    docs = retriever.invoke(search_instruction)\n",
    "    print(f\"{len(docs)} documents returned\")\n",
    "    for d in docs:\n",
    "        print(d)\n",
    "        print(d.page_content)\n",
    "        messages += d.page_content + \"\\n\"\n",
    "\n",
    "    return messages"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Web Search Tool\n",
    "\n",
    "The following is a tool that can be registered for web search usage.\n",
    "For a quick and easy demonstration, the implementation below uses the `googlesearch-python` library to perform the search, which is not officially supported by Google.\n",
    "If you require the web search tool beyond a PoC, you should use the official library and APIs of your favorite search provider, or a multi-engine service such as Tavily."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install googlesearch-python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import traceback\n",
    "from datetime import date\n",
    "from typing import Annotated\n",
    "\n",
    "from googlesearch import search\n",
    "\n",
    "\n",
    "@assistant.register_for_llm(name=\"web_search\", description=\"Searches the web according to a given query\")\n",
    "@user_proxy.register_for_execution(name=\"web_search\")\n",
    "def do_web_search(\n",
    "    search_instruction: Annotated[\n",
    "        str,\n",
    "        \"Provide a detailed search instruction that incorporates specific features, goals, and contextual details related to the query. \\\n",
    "                                                Identify and include relevant aspects from any provided context, such as key topics, technologies, challenges, timelines, or use cases. \\\n",
    "                                                Construct the instruction to enable a targeted search by specifying important attributes, keywords, and relationships within the context.\",\n",
    "    ],\n",
    ") -> str:\n",
    "    \"\"\"This function is used for searching the web for information that can only be found on the internet, not in the users personal notes.\"\"\"\n",
    "    if not search_instruction:\n",
    "        return \"Please provide a search query.\"\n",
    "\n",
    "    # First, we convert the incoming query into a search term.\n",
    "    today = date.today().strftime(\"%Y-%m-%d\")\n",
    "\n",
    "    chat_result = user_proxy.initiate_chat(\n",
    "        recipient=web_search_assistant,\n",
    "        message=\"Today's date is \" + today + \". \" + search_instruction,\n",
    "        max_turns=1,\n",
    "    )\n",
    "    summary = chat_result.chat_history[-1][\"content\"]\n",
    "\n",
    "    results = []\n",
    "\n",
    "    try:\n",
    "        response = search(summary, advanced=True)\n",
    "        for result in response:\n",
    "            entry = {}\n",
    "            if type(result) is not str:\n",
    "                entry[\"title\"] = result.title\n",
    "                entry[\"url\"] = result.url\n",
    "                entry[\"description\"] = result.description\n",
    "                results.append(entry)\n",
    "            else:\n",
    "                results.append(result)\n",
    "\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "        print(traceback.format_exc())\n",
    "        return f\"Unable to execute search query due to the following exception: {e}\"\n",
    "\n",
    "    return str(results)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plan parser\n",
    "\n",
    "When the initial plan is formed, the LLM should output the plan in JSON format. The following parses the response from the planner and returns the response as a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from typing import Any\n",
    "\n",
    "\n",
    "def parse_response(message: str) -> dict[str, Any]:\n",
    "    \"\"\"\n",
    "    Parse the response from the planner and return the response as a dictionary.\n",
    "    \"\"\"\n",
    "    # Parse the response content\n",
    "    json_response = {}\n",
    "    # if message starts with ``` and ends with ``` then remove them\n",
    "    if message.startswith(\"```\"):\n",
    "        message = message[3:]\n",
    "    if message.endswith(\"```\"):\n",
    "        message = message[:-3]\n",
    "    if message.startswith(\"json\"):\n",
    "        message = message[4:]\n",
    "    if message.startswith(\"python\"):\n",
    "        message = message[6:]\n",
    "    message = message.strip()\n",
    "    try:\n",
    "        json_response: dict[str, Any] = json.loads(message)\n",
    "    except Exception as e:\n",
    "        # If the response is not a valid JSON, try pass it using string matching.\n",
    "        # This should seldom be triggered\n",
    "        print(\n",
    "            f'LLM response was not properly formed JSON. Will try to use it as is. LLM response: \"{message}\". Error: {e}'\n",
    "        )\n",
    "        message = message.replace(\"\\\\n\", \"\\n\")\n",
    "        message = message.replace(\"\\n\", \" \")  # type: ignore\n",
    "        if \"plan\" in message and \"next_step\" in message:\n",
    "            start = message.index(\"plan\") + len(\"plan\")\n",
    "            end = message.index(\"next_step\")\n",
    "            json_response[\"plan\"] = message[start:end].replace('\"', \"\").strip()\n",
    "\n",
    "    return json_response"
   ]
  },
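  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can exercise `parse_response` on a representative planner reply to confirm that the code-fence stripping and JSON parsing behave as expected:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A typical planner reply, wrapped in a markdown code fence.\n",
    "sample_reply = '```{\"plan\": [\"Step 1\", \"Step 2\"]}```'\n",
    "parsed = parse_response(sample_reply)\n",
    "print(parsed)  # {'plan': ['Step 1', 'Step 2']}\n",
    "assert parsed[\"plan\"] == [\"Step 1\", \"Step 2\"]"
   ]
  },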
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Workflow\n",
    "\n",
    "Here is the meat of the agentic workflow.\n",
    "They key point here is that extract maximum performance a set of agents that are called in sequence, and the data passed between them is curated and trimmed, rather than passing around the full chat history."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#########################\n",
    "# Begin Agentic Workflow\n",
    "#########################\n",
    "\n",
    "\n",
    "def run_agentic_workflow(user_message: str) -> str:\n",
    "    \"\"\"\n",
    "    Run the agentic workflow.\n",
    "    \"\"\"\n",
    "\n",
    "    # Make a plan\n",
    "    raw_plan = user_proxy.initiate_chat(message=user_message, max_turns=1, recipient=planner).chat_history[-1][\n",
    "        \"content\"\n",
    "    ]\n",
    "    plan_dict = parse_response(raw_plan)\n",
    "\n",
    "    # Start executing plan\n",
    "    answer_output = []  # This variable tracks the output of previous successful steps as context for executing the next step\n",
    "    steps_taken = []  # A list of steps already executed\n",
    "    last_output = \"\"  # Output of the single previous step gets put here\n",
    "    last_step = \"\"\n",
    "\n",
    "    for _ in range(max_plan_steps):\n",
    "        if last_output == \"\":\n",
    "            # This is the first step of the plan since there's no previous output\n",
    "            instruction = plan_dict[\"plan\"][0]\n",
    "        else:\n",
    "            # Previous steps in the plan have already been executed.\n",
    "            reflection_message = last_step\n",
    "            # Ask the critic if the previous step was properly accomplished\n",
    "            was_job_accomplished = user_proxy.initiate_chat(\n",
    "                recipient=critic,\n",
    "                max_turns=1,\n",
    "                message=CRITIC_PROMPT.format(last_step=last_step, last_output=last_output),\n",
    "            ).chat_history[-1][\"content\"]\n",
    "            # If it was not accomplished, make sure an explanation is provided for the reflection assistant\n",
    "            if \"##NO##\" in was_job_accomplished:\n",
    "                reflection_message = f\"The previous step was {last_step} but it was not accomplished satisfactorily due to the following reason: \\n {was_job_accomplished}.\"\n",
    "\n",
    "            # Then, ask the reflection agent for the next step\n",
    "            message = {\n",
    "                \"Goal\": user_message,\n",
    "                \"Plan\": str(plan_dict),\n",
    "                \"Last Step\": reflection_message,\n",
    "                \"Last Step Output\": str(last_output),\n",
    "                \"Steps Taken\": str(steps_taken),\n",
    "            }\n",
    "            instruction = user_proxy.initiate_chat(\n",
    "                recipient=reflection_assistant, max_turns=1, message=str(message)\n",
    "            ).chat_history[-1][\"content\"]\n",
    "\n",
    "            # Only append the previous step and its output to the record if it accomplished its task successfully.\n",
    "            # Testing showed that recording unsuccessful steps confuses the agents more than it helps them\n",
    "            if \"##NO##\" not in was_job_accomplished:\n",
    "                answer_output.append(last_output)\n",
    "                steps_taken.append(last_step)\n",
    "\n",
    "            if \"##TERMINATE##\" in instruction:\n",
    "                # A termination message means there are no more steps to take. Exit the loop.\n",
    "                break\n",
    "\n",
    "        # Now that we have determined the next step to take, execute it\n",
    "        prompt = instruction\n",
    "        if answer_output:\n",
    "            prompt += f\"\\n Contextual Information: \\n{answer_output}\"\n",
    "        output = user_proxy.initiate_chat(recipient=assistant, max_turns=3, message=prompt)\n",
    "\n",
    "        # Sort through the chat history and extract the replies from the assistant (we don't need the full results of the tool calls, just the assistant's summary)\n",
    "        previous_output = []\n",
    "        for chat_item in output.chat_history:\n",
    "            if chat_item[\"content\"] and chat_item[\"name\"] == \"Research_Assistant\":\n",
    "                previous_output.append(chat_item[\"content\"])\n",
    "\n",
    "        # Testing showed that the assistant's output often contains the right information but is not formatted to directly answer the instruction.\n",
    "        # The summary agent therefore reformats the assistant's output so that it directly answers the instruction the assistant was given.\n",
    "        summary_output = user_proxy.initiate_chat(\n",
    "            recipient=summary_agent,\n",
    "            max_turns=1,\n",
    "            message=f\"The instruction is: {instruction} Please directly answer the instruction given the following data: {previous_output}\",\n",
    "        )\n",
    "\n",
    "        # The previous instruction and its output will be recorded for the next iteration to inspect before determining the next step of the plan\n",
    "        last_output = summary_output.chat_history[-1][\"content\"]\n",
    "        last_step = instruction\n",
    "\n",
    "    # Now that we've gathered all the information we need, we will summarize it to directly answer the original prompt\n",
    "    final_prompt = (\n",
    "        f\"Answer the user's query: {user_message}, using only the following contextual information: {answer_output}\"\n",
    "    )\n",
    "    final_output = user_proxy.initiate_chat(\n",
    "        message=final_prompt, max_turns=1, recipient=generic_assistant\n",
    "    ).chat_history[-1][\"content\"]\n",
    "\n",
    "    return final_output"
   ]
  },
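  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The curated hand-off between agents can be illustrated in isolation. The sketch below uses hypothetical `chat_history` data (not the agents defined above) to show the same filter-and-trim idea: only the assistant's own non-empty replies are kept, while raw tool-call payloads are dropped before the context is passed along.\n",
    "\n",
    "```python\n",
    "# Hypothetical chat history mixing assistant replies with raw tool output\n",
    "chat_history = [\n",
    "    {\"name\": \"Research_Assistant\", \"content\": \"Summary of findings...\"},\n",
    "    {\"name\": \"tool_executor\", \"content\": \"<raw tool output, thousands of tokens>\"},\n",
    "    {\"name\": \"Research_Assistant\", \"content\": None},\n",
    "]\n",
    "\n",
    "# Keep only non-empty replies authored by the assistant itself\n",
    "curated = [\n",
    "    m[\"content\"]\n",
    "    for m in chat_history\n",
    "    if m[\"content\"] and m[\"name\"] == \"Research_Assistant\"\n",
    "]\n",
    "```\n",
    "\n",
    "Passing `curated` (rather than `chat_history`) to the next agent keeps each prompt short and focused, which matters most for small models with limited context budgets."
   ]
  },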
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run a query\n",
    "\n",
    "Below is an example query. Replace it with your own query based on the types of information you have loaded into the vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "answer = run_agentic_workflow(\n",
    "    user_message=\"Identify the key technical features of my projects and for each feature, fetch me the latest news articles in the tech industry related to that feature.\"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Optimizing Small, Local LLMs in Multi-Agent RAG Workflows using IBM Granite, Document Retrieval, Web Search, and Ollama",
   "tags": [
    "Small LLMs",
    "RAG",
    "Web Search",
    "IBM Granite",
    "Ollama",
    "Planning",
    "Reflection"
   ]
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
